SmartZoning® Documentation
Leveraging Permitting and Zoning Data to Predict Upzoning Pressure, Philadelphia
This model and web application prototype were developed for MUSA508, a Master of Urban Spatial Analytics class focused on predictive public policy analytics at the University of Pennsylvania.
1 Objective
Growth is critical for a city to continue to densify and modernize. The benefits of growth range from increased public transit use to updating the built environment to be more climate resilient. Growth fuels development and vice versa. In Philadelphia, the US’s sixth-largest city, which ranks 42nd in cost of living, growth is met with concern. Many residents and preservationists ask: Will growth deteriorate the city’s best features? Will modernization make the city unaffordable to longtime residents?
Balancing growth with affordability is a precarious task for Philadelphia. To date, politicians have favored making exceptions for developers parcel by parcel rather than championing a citywide smart growth strategy. Zoning advocates need better data-driven tools that both broadcast the benefits of smart growth, a planning framework that aims to maximize walkability and transit use to avoid sprawl, and demonstrate how parcel-by-parcel (or spot) zoning creates unmet development pressure that can drive up costs. SmartZoning is a prototype web tool that identifies parcels under development pressure with conflicting zoning. Users can leverage the tool strategically to promote proactive upzoning of high-priority parcels, aligning current zoning more closely with anticipated development. This approach aims to foster affordable housing in Philadelphia, addressing one of the city’s most pressing challenges.
Smart Growth meets SmartZoning
2 Introduction
The following documentation details the development of a predictive model that forecasts future development patterns with a low mean absolute error (MAE). By forecasting where growth is likely to occur from past permitting data, and comparing that against where current zoning may hinder growth, this model serves as the critical backbone of SmartZoning’s functionality. The study also considers the relationship between development pressure and race, income, and housing cost burden, both to strengthen the predictive model and to investigate the impacts of development locally and citywide.
3 Select and Engineer Features
This study leverages open data sources, including permit counts, council district boundaries, racial mix, median income, and housing cost burden, to holistically understand what drives development pressure. Generally, data are collected at the block group or parcel level and aggregated up to the council district to capture both local and citywide trends.
| Dataset | Source | Geo Level |
|---|---|---|
| Construction Permits | Philadelphia Dept. of Licenses & Inspections | Parcel |
| Zoning Base Map | Planning Commission | Parcel |
| Zoning Overlays | Planning Commission | Parcel |
| Demographic and Socioeconomic Data | U.S. Census Bureau’s ACS 5-Y | Block Group |
| Council District Boundaries and Leadership | City of Philadelphia | Parcel |
3.1 Permits
First, permit data from 2012 to 2023 from the Philadelphia Department of Licenses and Inspections are critical to the study. This study filters only for new construction permits granted for residential projects. In the future, also including full and substantial renovation permits could add nuance to what constitutes development pressure.
The spike in new construction permits in 2021 can reasonably be attributed to the expiration of a tax abatement program for developers.
When new construction permits are broken down by council district, a few districts account for the bulk of new permits during the 2021 peak.
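A minimal sketch of the permit filter and the yearly and district tallies described above follows. The field names (`year`, `district`, `type_of_work`, `use`) and their values are hypothetical placeholders, not the actual L&I permit schema, and the records are toy data.

```python
from collections import Counter

# Hypothetical permit records; field names and values are placeholders,
# not the actual L&I schema.
permits = [
    {"year": 2020, "district": 5, "type_of_work": "NEW CONSTRUCTION", "use": "RESIDENTIAL"},
    {"year": 2021, "district": 5, "type_of_work": "NEW CONSTRUCTION", "use": "RESIDENTIAL"},
    {"year": 2021, "district": 2, "type_of_work": "NEW CONSTRUCTION", "use": "RESIDENTIAL"},
    {"year": 2021, "district": 5, "type_of_work": "ALTERATION", "use": "RESIDENTIAL"},
    {"year": 2022, "district": 1, "type_of_work": "NEW CONSTRUCTION", "use": "COMMERCIAL"},
]


def new_residential(records):
    """Keep only new-construction permits for residential projects."""
    return [
        r for r in records
        if r["type_of_work"] == "NEW CONSTRUCTION" and r["use"] == "RESIDENTIAL"
    ]


kept = new_residential(permits)
# Tally by year (to inspect spikes such as the 2021 peak) and by year and district.
by_year = Counter(r["year"] for r in kept)
by_year_district = Counter((r["year"], r["district"]) for r in kept)
```

Swapping in renovation permits later would only mean widening the condition inside `new_residential`.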
3.1.1 Feature Engineering by Time and Space
To better understand the relationship between time-space lag and permit count,… Notably…
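The time-space lag features that appear in the VIF table later in this section (`lag_1_year` through `lag_9_years` and `lag_spat_1_year` through `lag_spat_9_years`) could be built along the lines of the sketch below. It assumes, hypothetically, that a time lag is the same block group’s permit count k years earlier and that a spatial lag is the mean count across adjacent block groups k years earlier; the study’s exact definitions and neighbor rule are not stated here, and the counts and adjacency are invented.

```python
# counts[(block_group, year)] -> new residential permits (toy data)
counts = {
    ("A", 2020): 2, ("A", 2021): 5,
    ("B", 2020): 4, ("B", 2021): 1,
    ("C", 2020): 0, ("C", 2021): 3,
}
# Hypothetical adjacency list of block groups
neighbors = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}


def time_lag(bg, year, k):
    """Permit count in the same block group k years earlier (0 if missing)."""
    return counts.get((bg, year - k), 0)


def spatial_lag(bg, year, k):
    """Mean permit count across neighboring block groups k years earlier."""
    nbrs = neighbors.get(bg, [])
    if not nbrs:
        return 0.0
    return sum(counts.get((n, year - k), 0) for n in nbrs) / len(nbrs)


def lag_features(bg, year, max_lag=9):
    """Build lag_* and lag_spat_* features named like those in the VIF table."""
    feats = {}
    for k in range(1, max_lag + 1):
        suffix = "1_year" if k == 1 else f"{k}_years"
        feats[f"lag_{suffix}"] = time_lag(bg, year, k)
        feats[f"lag_spat_{suffix}"] = spatial_lag(bg, year, k)
    return feats


features_a_2022 = lag_features("A", 2022, max_lag=2)
```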
3.2 Socioeconomics
Racial mix (white vs. non-white), median income, and housing cost burden are socioeconomic factors that often play an outsized role in affordability in cities like Philadelphia, which has a pervasive and persistent history of housing discrimination and systemic disinvestment. All of these data are drawn from the US Census Bureau’s American Community Survey (ACS) 5-Year estimates.
Spatially, it is clear that non-white communities earn lower median incomes and experience higher rates of extreme rent burden (households spending more than 35% of income on gross rent).
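As a worked illustration of the extreme rent burden definition, the sketch below computes the share of renter households at or above the 35% threshold for one block group. The binning is an assumption loosely modeled on how the ACS tabulates gross rent as a percentage of household income, and the counts are invented. Because a bin that starts at 35% includes households at exactly 35%, this only approximates “more than 35%”.

```python
# Hypothetical renter-household counts for one block group, keyed by the lower
# bound (in percent) of each gross-rent-as-share-of-income bin.
rent_bins = {0: 40, 10: 30, 20: 25, 30: 15, 35: 10, 40: 12, 50: 18}


def extreme_rent_burden_share(bins, threshold=35):
    """Share of renter households whose bin starts at or above `threshold` percent."""
    total = sum(bins.values())
    burdened = sum(n for lower, n in bins.items() if lower >= threshold)
    return burdened / total if total else 0.0


share = extreme_rent_burden_share(rent_bins)  # 40 of 150 households
```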
Considering the strong spatial relationship between socioeconomics and certain areas of Philadelphia, we will be sure to investigate our model’s generalizability against race and income.
4 Build Predictive Models
“All the complaints about City zoning regulations really boil down to the fact that City Council has suppressed infill housing or restricted multi-family uses, which has served to push average housing costs higher.” - Jon Geeting, Philly 3.0 Engagement Director
SmartZoning® seeks to predict where permits are most likely to be filed as a proxy for urban growth. As discussed, predicting growth is fraught because growth is shaped more by political forces than by the plans published by the city’s Planning Commission. Comprehensive plans, typically set on ten-year timelines, tend to become mere suggestions, ultimately subject to the prerogatives of City Council members rather than serving as steadfast guides for smart growth. With these dynamics in mind, SmartZoning’s prediction model accounts for socioeconomics, council district, and time-space lag.
4.1 Tests for Correlation
The goal is to select the variables that correlate most strongly with permit count for inclusion in the predictive model. Correlation is a type of association test. For example, are permit counts more closely associated with population or with median income? Do racial mix and rent burden offer redundant insight? These are the subtle but important distinctions we aim to identify.
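A minimal sketch of the correlation screen: compute the Pearson coefficient between permit counts and each candidate variable, then rank candidates by absolute value. The block-group values are invented, and Pearson correlation is an assumption, since the study does not say which coefficient it used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy block-group data (hypothetical values)
permits = [1, 3, 2, 8, 5]
candidates = {
    "total_pop": [900, 1500, 1200, 2600, 2000],
    "med_inc": [52000, 41000, 60000, 38000, 45000],
}

corr = {name: pearson(permits, col) for name, col in candidates.items()}
# Strongest association first, regardless of sign
ranked = sorted(corr, key=lambda name: abs(corr[name]), reverse=True)
```

The same `pearson` helper applied to two predictors (for example racial mix and rent burden) is one quick way to spot redundancy before the VIF check.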
4.1.1 Correlation Coefficients
4.1.2 VIF
| Variable | GVIF | GVIF^(1/(2·Df)) |
|---|---|---|
| district | 1175997.928011 | 2.173926 |
| hist_dist_na | 33.120644 | 5.755054 |
| hist_dist_historic_street_paving_thematic_district | 26.735386 | 5.170627 |
| overlay_fne | 15.601603 | 3.949886 |
| overlay_ne | 11.179327 | 3.343550 |
| overlay_nis | 7.717070 | 2.777961 |
| overlay_ndo | 6.867713 | 2.620632 |
| overlay_fdo | 6.574022 | 2.563985 |
| overlay_edo | 5.595256 | 2.365429 |
| overlay_vdo | 5.400210 | 2.323835 |
| dist_to_transit | 4.378395 | 2.092462 |
| lag_spat_4_years | 3.946301 | 1.986530 |
| lag_spat_2_years | 3.767694 | 1.941055 |
| lag_spat_5_years | 3.722626 | 1.929411 |
| lag_spat_6_years | 3.689699 | 1.920859 |
| lag_spat_3_years | 3.620857 | 1.902855 |
| lag_spat_7_years | 3.581870 | 1.892583 |
| lag_spat_8_years | 3.530726 | 1.879023 |
| percent_nonwhite | 3.147111 | 1.774010 |
| lag_spat_1_year | 2.943444 | 1.715647 |
| overlay_ctr | 2.938630 | 1.714243 |
| med_inc | 2.826465 | 1.681210 |
| lag_spat_9_years | 2.645426 | 1.626477 |
| hist_dist_rittenhouse_fitler_residential | 2.310139 | 1.519914 |
| overlay_ahc | 2.264476 | 1.504818 |
| lag_4_years | 2.260224 | 1.503404 |
| overlay_min | 2.184365 | 1.477960 |
| hist_dist_spring_garden | 2.178088 | 1.475835 |
| hist_dist_diamond_street | 2.126102 | 1.458116 |
| lag_5_years | 2.040912 | 1.428605 |
| lag_2_years | 2.012640 | 1.418676 |
| overlay_ncp | 1.990799 | 1.410957 |
| hist_dist_girard_estate | 1.933372 | 1.390457 |
| overlay_wwo | 1.931780 | 1.389885 |
| hist_dist_overbrook_farms_historic_district | 1.929129 | 1.388931 |
| lag_6_years | 1.896103 | 1.376991 |
| lag_7_years | 1.878892 | 1.370727 |
| lag_8_years | 1.873276 | 1.368677 |
| overlay_other | 1.865672 | 1.365896 |
| hist_dist_old_city | 1.839962 | 1.356452 |
| lag_3_years | 1.836923 | 1.355331 |
| overlay_eco | 1.800232 | 1.341727 |
| overlay_nca | 1.717870 | 1.310675 |
| hist_dist_awbury_arboretum | 1.690225 | 1.300086 |
| hist_dist_park_mall_temple_universitys_campus | 1.683083 | 1.297337 |
| lag_1_year | 1.680436 | 1.296316 |
| dist_to_2022 | 1.654760 | 1.286375 |
| lag_9_years | 1.646576 | 1.283190 |
| overlay_wst | 1.616137 | 1.271274 |
| overlay_nbo | 1.611118 | 1.269298 |
| percent_renters | 1.602037 | 1.265716 |
| overlay_tso | 1.462620 | 1.209388 |
| overlay_ima | 1.453865 | 1.205763 |
| ext_rent_burden | 1.444448 | 1.201852 |
| overlay_yod | 1.381172 | 1.175233 |
| overlay_hhc | 1.379881 | 1.174683 |
| overlay_nco | 1.379048 | 1.174329 |
| hist_dist_parkside | 1.375646 | 1.172880 |
| hist_dist_society_hill | 1.348453 | 1.161229 |
| overlay_tod | 1.345925 | 1.160140 |
| overlay_drc | 1.313695 | 1.146165 |
| overlay_gao | 1.307361 | 1.143399 |
| overlay_wwa | 1.299857 | 1.140113 |
| overlay_ued | 1.299239 | 1.139842 |
| overlay_na | 1.287432 | 1.134651 |
| hist_dist_tudor_east_falls | 1.285346 | 1.133731 |
| overlay_eod | 1.283286 | 1.132822 |
| rent_burden | 1.268730 | 1.126379 |
| overlay_cdo | 1.266165 | 1.125240 |
| hist_dist_league_island_park_aka_f_d_r_park | 1.239856 | 1.113488 |
| hist_dist_manayunk_main_street_historic_district | 1.232682 | 1.110262 |
| total_pop | 1.215123 | 1.102326 |
| overlay_nho | 1.194565 | 1.092961 |
| overlay_cgc | 1.189653 | 1.090712 |
| hist_dist_greenbelt_knoll | 1.184422 | 1.088311 |
| overlay_snm | 1.181435 | 1.086938 |
| overlay_cao | 1.179384 | 1.085995 |
| overlay_ame | 1.156756 | 1.075526 |
| overlay_ahp | 1.149727 | 1.072253 |
| overlay_stm | 1.137920 | 1.066734 |
| overlay_smh | 1.135161 | 1.065439 |
| overlay_env | 1.035448 | 1.017570 |
| overlay_wah | 1.031113 | 1.015437 |
| hist_dist_420_row | 1.025286 | 1.012564 |
| hist_dist_east_logan_street | 1.022054 | 1.010967 |
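The table reports generalized variance inflation factors (GVIF), a layout that appears to match the output of R’s `car::vif()` for models with multi-level factors. The second numeric column, GVIF^(1/(2·Df)), rescales each GVIF by its degrees of freedom so terms can be compared on a common scale. For one-df terms it equals the square root of GVIF (√15.60 ≈ 3.95 for `overlay_fne`); for `district`, 1175997.93^(1/18) ≈ 2.17, consistent with nine degrees of freedom from Philadelphia’s ten council districts.

For numeric one-df predictors, the sketch below computes ordinary VIFs as the diagonal of the inverse of the predictors’ correlation matrix, which equals 1/(1 - R_j^2) from regressing each predictor on the others. The columns are invented, and the sketch does not compute GVIF for multi-df terms such as `district`.

```python
from math import sqrt


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def invert(m):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(columns):
    """VIF_j is the j-th diagonal entry of the inverse predictor correlation matrix."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    inv = invert(corr)
    return [inv[j][j] for j in range(k)]


def scaled_gvif(gvif, df=1):
    """GVIF^(1/(2*df)); for a one-df term this is sqrt(VIF)."""
    return gvif ** (1 / (2 * df))


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]  # correlated with x1 (r = 0.8)
x3 = [2, 4, 4, 2, 3]  # uncorrelated with both x1 and x2
v = vifs([x1, x2, x3])
```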
Notably, permit count does not have a particularly strong correlation with any of our selected variables. This might suggest that permits are evenly distributed throughout the city. However, as we can see below, few block groups have more than 50 permits. This indicates that permits are granted on a block-by-block basis across all districts, so the need for SmartZoning applies to most Philadelphia neighborhoods, not just a select few.
4.2 Examine Spatial Patterns
To identify spatial clusters, or hotspots, in the permit data, we performed a Local Moran’s I test. It assesses local spatial autocorrelation: the extent to which the permit count in a block group tends to be similar to the counts in neighboring block groups. We used a p-value threshold of 0.1 to define hotspots.
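A minimal, from-scratch sketch of Local Moran’s I with a conditional permutation test follows. It assumes row-standardized adjacency weights and 999 permutations, uses invented counts on a simple line of block groups rather than real contiguity, and flags “High-High” hotspots at the p < 0.1 threshold used above. It illustrates the statistic; it is not the study’s code, and dedicated spatial statistics libraries offer fuller implementations.

```python
import random


def local_morans_i(x, neighbors, permutations=999, seed=0):
    """Return (I_i, pseudo p-value) per location using row-standardized weights.

    I_i = (z_i / m2) * sum_j w_ij * z_j, where z = x - mean(x), m2 = sum(z**2) / n,
    and w_ij = 1 / (number of neighbors of i).
    """
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    m2 = sum(v * v for v in z) / n
    rng = random.Random(seed)
    out = []
    for i in range(n):
        k = len(neighbors[i])
        if k == 0:
            out.append((0.0, 1.0))  # islands carry no local information
            continue
        obs = z[i] * (sum(z[j] for j in neighbors[i]) / k) / m2
        others = [z[j] for j in range(n) if j != i]
        extreme = 0
        for _ in range(permutations):
            # Conditional permutation: keep z_i fixed, draw k random "neighbors".
            sim = z[i] * (sum(rng.sample(others, k)) / k) / m2
            if (obs >= 0 and sim >= obs) or (obs < 0 and sim <= obs):
                extreme += 1
        out.append((obs, (extreme + 1) / (permutations + 1)))
    return out


# Toy example: 10 block groups on a line, with a cluster of high counts at one end.
counts = [0, 0, 0, 0, 0, 0, 0, 20, 20, 20]
nbrs = [[j for j in (i - 1, i + 1) if 0 <= j < len(counts)] for i in range(len(counts))]
mean_count = sum(counts) / len(counts)
results = local_morans_i(counts, nbrs)
# "High-High" hotspots: significant at p < 0.1, positive I, above-average count.
hotspots = [i for i, (I, p) in enumerate(results)
            if p < 0.1 and I > 0 and counts[i] > mean_count]
```

Only the interior high-count block group, whose two neighbors are also high, clears the threshold here; the end block group has a single neighbor, so a high neighbor is far more likely to arise by chance.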
4.3 Compare Models
Note that we train, test, and then validate: the first models below are based on 2022 data, we then run another model on 2023 data, and we predict 2024 at the end.
There are various regression models available, each with its own assumptions, strengths, and weaknesses. We compared Ordinary Least Squares (OLS), Poisson, and Random Forest regression. This comparison allowed us to consider each model’s accuracy, whether it overfit, its generalizability, and its computational efficiency.
The Poisson model was unviable because it overweighted outliers, so it is not detailed below.
4.3.1 OLS
OLS (ordinary least squares) regression is a method for exploring relationships between a dependent variable and one or more explanatory variables. It estimates the strength and direction of these relationships and the goodness of model fit. Our model incorporates three groups of engineered features: space lag, time lag, and distance to 2022. We include this last variable because, as discussed earlier, Philadelphia’s tax abatement policy led to a significant increase in residential development in the years immediately before 2022. We used OLS as a baseline to compare against Poisson and Random Forest. Because the observed and predicted permit counts were so tightly aligned, we tested dozens of variable combinations to rule out overfitting, and we are confident that our variables are generalizable and do not overfit.
Our OLS model has a mean absolute error (MAE) of 2.66, which is respectable for such a simple model. However, it struggles most in the areas where we most need it to succeed, so we introduce better variables and a more complex model to improve our predictions.
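To make the baseline concrete, the sketch below fits OLS through the normal equations and scores it with MAE, the metric reported above. The two toy features stand in for engineered variables such as a time lag and `dist_to_2022`; the data are invented and constructed to be exactly linear, so the fit recovers the generating coefficients and the MAE is essentially zero. It illustrates the technique and is not the study’s model.

```python
def fit_ols(X, y):
    """Return OLS coefficients [intercept, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented system [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Forward elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(a[k][c]))
        a[c], a[piv] = a[piv], a[c]
        for k in range(c + 1, p):
            f = a[k][c] / a[c][c]
            a[k] = [u - f * v for u, v in zip(a[k], a[c])]
    # Back substitution
    beta = [0.0] * p
    for c in reversed(range(p)):
        beta[c] = (a[c][p] - sum(a[c][j] * beta[j] for j in range(c + 1, p))) / a[c][c]
    return beta


def predict(beta, X):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]


def mae(actual, predicted):
    """Mean absolute error: average size of the miss, in permits."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy data generated as y = 1 + 2*x1 + 0.5*x2
X = [[0, 1], [1, 0], [2, 3], [3, 1], [4, 2]]
y = [1.5, 3.0, 6.5, 7.5, 10.0]
beta = fit_ols(X, y)
error = mae(y, predict(beta, X))
```

MAE is read in the same units as the target, so an MAE of 2.66 means the model misses by roughly 2.66 permits per block group on average.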
4.3.2 Random Forest
OLS and Random Forest represent different modeling paradigms. OLS is a linear regression model suited to capturing linear relationships, while Random Forest is an ensemble method that can capture non-linear patterns and offers greater flexibility across data scenarios. Random Forest is also generally less sensitive to multicollinearity, because each tree considers subsets of features and the forest averages their predictions, and the effect of outliers tends to be mitigated. We therefore decided it was worth investigating Random Forest as an alternative model.
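The two ideas in the paragraph above, trees grown on bootstrap samples and a random subset of candidate features at each split, can be seen in the toy, from-scratch regression forest below. It is a sketch for intuition only, not the implementation used in this study. In practice a library such as R’s `randomForest` package or scikit-learn’s `RandomForestRegressor` would be used. The default of one third of the features per split (at least one) mirrors the regression default for `mtry` in R’s `randomForest`; tree depth, leaf size, and tree count are arbitrary choices here.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean: the regression split criterion."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, y, depth, max_depth, min_leaf, n_feats, rng):
    """Grow a regression tree; each split tries only a random subset of features."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return sum(y) / len(y)  # leaf node: mean response
    best = None  # (score, feature, threshold)
    for f in rng.sample(range(len(rows[0])), n_feats):
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [v for r, v in zip(rows, y) if r[f] <= t]
            right = [v for r, v in zip(rows, y) if r[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return sum(y) / len(y)
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    args = (depth + 1, max_depth, min_leaf, n_feats, rng)
    return (f, t,
            build_tree([rows[i] for i in li], [y[i] for i in li], *args),
            build_tree([rows[i] for i in ri], [y[i] for i in ri], *args))


def predict_tree(node, row):
    """Follow (feature, threshold, left, right) splits down to a leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=50, max_depth=4, min_leaf=2, n_feats=None, seed=0):
    """Fit trees on bootstrap samples; default features per split = max(1, p // 3)."""
    rng = random.Random(seed)
    n_feats = n_feats or max(1, len(X[0]) // 3)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, min_leaf, n_feats, rng))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)


# Toy data: a step in feature 0 that a single straight line fits poorly; feature 1 is noise.
X = [[i, (i * 7) % 5] for i in range(20)]
y = [10.0 if i >= 10 else 0.0 for i in range(20)]
forest = fit_forest(X, y, n_trees=25, n_feats=2, seed=1)
```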
Compared to the OLS model, the relationship between predicted vs actual permits…
Compared to the OLS model, the Random Forest model has a similar error distribution; however, it exhibits an MAE of…
5 Model Validation
Given Random Forest’s favorable results and attributes for our study compared to OLS, we train and test our predictive model using Random Forest.
We decided to split our training and testing data up to 2022 in an effort to balance permitting activity before and after the tax abatement expired.
We train and test on data up to 2022 and use this period for model tuning and feature engineering.
Having settled on our model features and tuning, we now validate on 2023 data.
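A minimal sketch of the two-stage temporal scheme described above: hold out 2022 while tuning, then freeze the design, refit on everything through 2022, and validate on 2023. The row structure, the year range, and the choice to train on all earlier target years are assumptions. The point is that no row from the held-out year, or any later year, enters training.

```python
def split_by_year(rows, holdout_year):
    """Train on target years before `holdout_year`; evaluate on that year only.

    Rows from later years are excluded entirely so no future information leaks in.
    """
    train = [r for r in rows if r["year"] < holdout_year]
    holdout = [r for r in rows if r["year"] == holdout_year]
    return train, holdout


# Hypothetical block-group-year rows (features and permit counts omitted).
rows = [{"block_group": bg, "year": yr} for bg in ("A", "B") for yr in range(2014, 2024)]

# Stage 1: tune features and hyperparameters, holding out 2022.
tune_train, tune_test = split_by_year(rows, 2022)
# Stage 2: with the design frozen, refit on years through 2022 and validate on 2023.
val_train, val_test = split_by_year(rows, 2023)
```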
On the 2023 validation data, the model returns an MAE of 2.2.
6 Discussion
6.1 Accuracy
Predominantly, our model overpredicts. In this context, overpredicting is preferable to underpredicting because it errs on the side of facilitating new development.
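One way to check the direction of error is to report the mean signed error and the share of block groups that are overpredicted alongside MAE. The helper and the values below are hypothetical illustrations; a positive error here means the prediction exceeds the actual count.

```python
def error_summary(actual, predicted):
    """Summarize the size and direction of prediction errors."""
    errors = [p - a for a, p in zip(actual, predicted)]  # positive = overprediction
    n = len(errors)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "mean_signed_error": sum(errors) / n,
        "share_over": sum(e > 0 for e in errors) / n,
    }


summary = error_summary(actual=[0, 2, 5, 1], predicted=[1, 3, 4, 3])
```

A positive `mean_signed_error` together with a `share_over` above one half is what a predominantly overpredicting model would show.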
6.2 Generalizability
The constructed boxplot, categorizing observations based on racial composition, indicates that the random forest model generalizes effectively, showcasing consistent and relatively low absolute errors across majority non-white and majority white categories. The discernible similarity in error distributions suggests that the model’s predictive performance remains robust across diverse racial compositions, affirming its ability to generalize successfully.
We find that error is not related to affordability and actually trends downward as percent nonwhite increases. This is likely because majority-minority neighborhoods see less total development to begin with, so the absolute magnitude of error is smaller, even though the proportional error may be larger. Error increases slightly with total population, which matches the intuition that more people means more development.
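The racial-composition comparison can also be summarized numerically as the MAE within each group. In this sketch the 50% cutoff for “majority non-white”, the field names, and the values are assumptions for illustration.

```python
from collections import defaultdict


def mae_by_group(rows, key):
    """Mean absolute error within each group defined by `key(row)`."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum of |error|, count]
    for r in rows:
        g = key(r)
        totals[g][0] += abs(r["pred"] - r["actual"])
        totals[g][1] += 1
    return {g: s / n for g, (s, n) in totals.items()}


rows = [
    {"percent_nonwhite": 0.82, "actual": 1, "pred": 2},
    {"percent_nonwhite": 0.65, "actual": 0, "pred": 0.5},
    {"percent_nonwhite": 0.20, "actual": 6, "pred": 4},
    {"percent_nonwhite": 0.35, "actual": 3, "pred": 4},
]
by_race = mae_by_group(
    rows,
    lambda r: "majority non-white" if r["percent_nonwhite"] > 0.5 else "majority white",
)
```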
How does this generalize across council districts? This question remains open.
7 Assessing Upzoning Pressure
We can identify conflict between projected development and current zoning.
We focus on parcels zoned industrial or single-family residential in areas that our model suggests are at high risk of development in 2023.
We can attach the block-level development predictions to these parcels and then visualize them by highest need.
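A minimal sketch of the conflict filter: keep parcels whose zoning code marks them as single-family (`RSD`, `RSA`) or industrial (codes starting with `I`, such as `ICMX` and `IRMX`), attach the block-level prediction, and rank by predicted permits. The prefix rule, the `block_group` join key, the `min_pred` cutoff, and the toy parcels are hypothetical; `OBJECTID`, `CODE`, and `rf_val_preds` are borrowed from the table in this section.

```python
# Zoning-code prefixes treated as conflicting (an assumption for illustration):
# RSD/RSA = single-family detached/attached; I = industrial (e.g., ICMX, IRMX).
CONFLICT_PREFIXES = ("RSD", "RSA", "I")


def zoning_conflicts(parcels, bg_preds, min_pred):
    """Return conflicting parcels in high-prediction areas, highest prediction first."""
    flagged = []
    for parcel in parcels:
        pred = bg_preds.get(parcel["block_group"])
        if pred is None or pred < min_pred:
            continue
        if parcel["CODE"].startswith(CONFLICT_PREFIXES):
            flagged.append({**parcel, "rf_val_preds": pred})
    return sorted(flagged, key=lambda p: p["rf_val_preds"], reverse=True)


parcels = [
    {"OBJECTID": 1, "CODE": "ICMX", "block_group": "A"},
    {"OBJECTID": 2, "CODE": "RM1", "block_group": "A"},
    {"OBJECTID": 3, "CODE": "RSA5", "block_group": "B"},
    {"OBJECTID": 4, "CODE": "RSD1", "block_group": "C"},
]
preds = {"A": 27.1, "B": 12.8, "C": 2.0}
ranked = zoning_conflicts(parcels, preds, min_pred=10)
```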
Furthermore, we can identify properties with high potential for assemblage, which suggests the ability to accommodate high-density, multi-unit housing.
| Row | rf_val_preds | n_contig | OBJECTID | CODE |
|---|---|---|---|---|
| 868 | 27.06613 | 3 | 1615 | ICMX |
| 1548 | 27.06613 | 3 | 2736 | IRMX |
| 1587 | 27.06613 | 3 | 2804 | IRMX |
| 3420 | 27.06613 | 3 | 6405 | RSA5 |
| 4667 | 27.06613 | 3 | 9661 | RSA5 |
| 9169 | 27.06613 | 4 | 20073 | ICMX |
| 1768 | 22.24860 | 3 | 3128 | IRMX |
| 3640 | 22.24860 | 3 | 6901 | ICMX |
| 7517 | 21.08143 | 3 | 16717 | RSA5 |
| 3934 | 20.84390 | 3 | 7646 | ICMX |
| 12326 | 20.84390 | 4 | 25776 | RSA5 |
| 4957 | 20.67827 | 3 | 10410 | ICMX |
| 4958 | 20.67827 | 3 | 10411 | RSA5 |
| 4959 | 20.67827 | 3 | 10412 | ICMX |
| 5245 | 20.67827 | 3 | 11160 | RSA5 |
| 4460 | 17.31087 | 3 | 9093 | RSA5 |
| 7726 | 15.16207 | 3 | 17168 | ICMX |
| 13578 | 15.12210 | 3 | 27869 | IRMX |
| 5088 | 15.00973 | 3 | 10759 | IRMX |
| 4512 | 14.95163 | 5 | 9243 | IRMX |
| 6014 | 14.95163 | 6 | 13057 | ICMX |
| 3041 | 12.82333 | 3 | 5568 | ICMX |
| 9842 | 12.82333 | 3 | 21369 | RSA5 |
| 9843 | 12.82333 | 3 | 21370 | ICMX |
| 9845 | 12.82333 | 3 | 21372 | RSA5 |
| 7833 | 12.25843 | 3 | 17408 | RSA5 |
| 3957 | 11.49060 | 3 | 7704 | IRMX |
| 6645 | 11.22550 | 3 | 14648 | ICMX |
| 7280 | 11.22550 | 3 | 16179 | RSA5 |
| 9912 | 11.22550 | 3 | 21527 | ICMX |
| 2138 | 10.88253 | 4 | 3744 | IRMX |
| 8143 | 10.71940 | 3 | 18031 | RSD3 |
| 8656 | 10.71940 | 3 | 19076 | RSA3 |
| 9409 | 10.71940 | 4 | 20534 | RSA2 |
| 10175 | 10.71940 | 3 | 22002 | RSD1 |
| 12605 | 10.71940 | 3 | 26247 | RSD1 |
| 4146 | 10.39053 | 3 | 8265 | IRMX |
| 5108 | 10.39053 | 4 | 10795 | IRMX |
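The `n_contig` column appears to count how many flagged parcels sit in one contiguous cluster. The sketch below computes such counts with a union-find over parcel adjacency pairs and keeps clusters of at least three, the smallest value in the table. That interpretation of `n_contig`, the adjacency pairs (which in practice would come from a GIS “touches” test), and the parcel IDs are all assumptions.

```python
def contiguous_counts(parcel_ids, adjacency):
    """Map each flagged parcel to the size of its contiguous cluster (union-find)."""
    parent = {p: p for p in parcel_ids}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving keeps trees shallow
            p = parent[p]
        return p

    for a, b in adjacency:
        if a in parent and b in parent:  # ignore neighbors that are not flagged
            parent[find(a)] = find(b)

    sizes = {}
    for p in parcel_ids:
        root = find(p)
        sizes[root] = sizes.get(root, 0) + 1
    return {p: sizes[find(p)] for p in parcel_ids}


# Hypothetical flagged parcel IDs and adjacency pairs
flagged = [101, 102, 103, 104, 105, 106]
touches = [(101, 102), (102, 103), (104, 105), (106, 999)]
n_contig = contiguous_counts(flagged, touches)
assemblage = [p for p in flagged if n_contig[p] >= 3]
```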
8 2024 Predictions
9 Web Application
10 Next Steps
11 Appendices